sgdisk: Run partx after partition changes #1717
The sgdisk tool does not update the kernel partition table, in contrast to other similar tools. Often udev can detect the changes, but not always, as experienced when adding a new partition on Flatcar's boot disk. Instead of implicitly relying on some other component to re-read the kernel partition table, trigger the re-read with partprobe. This is proposed in coreos/ignition#1717
This adds coreos/ignition#1717 as downstream patch to fix flatcar/Flatcar#1194
Thanks, this makes sense to me and may actually fix #1729. Some comments but the idea overall LGTM.
docs/release-notes.md
@@ -15,10 +15,12 @@ nav_order: 9

### Changes

- The Dracut module now installs partprobe
I'd prefer we use `blockdev --rereadpt` which is part of util-linux and more likely to be already available (e.g. FCOS doesn't ship `partprobe`).
I tried this and it doesn't achieve the same thing because it can fail with `BLKRRPART: Device or resource busy` while `partprobe` succeeds in forcing a kernel partition re-read. If you know that you don't have this problem, you could use the introduced command parameter to run blockdev instead. Does this work for you?
I guess this requires a wrapper to pass the `--rereadpt` argument. Your initrd could ship something like that as `blockdev-rereadpt` and then you configure it as partprobeCmd:
#!/bin/sh
exec blockdev --rereadpt "$@"
> I tried this and it doesn't achieve the same thing because it can fail with `BLKRRPART: Device or resource busy` while `partprobe` succeeds in forcing a kernel partition re-read.
Not sure why it's hitting `EBUSY`. The common case would be if one of the filesystems on that disk is mounted, but since we were just able to modify the partition table, that shouldn't be the case. Possibly we're racing with udev?
And actually on that topic, would it work here to play the same trick as #1319 and emit a tagged event against the disk?
Right, one of the filesystems is mounted and this happens when adding a new partition to the boot disk.
Ok, so what works is `partx -u - /dev/vda` and this should have the same effect as partprobe which also adds the individual partitions to the kernel. Since partx should be available everywhere where util-linux is, should I change the PR to use partx?
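For illustration, the `partx -u - /dev/vda` call above could be issued from Go roughly as follows. This is only a sketch: the helper name `partxUpdateCmd` is hypothetical and not taken from the PR, and it assumes `partx` from util-linux is on the `PATH` and the program runs with enough privileges.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// partxUpdateCmd is a hypothetical helper that builds the command
// "partx -u - <disk>", which asks partx to update the kernel's view
// of the partitions on the given disk.
func partxUpdateCmd(partx, disk string) *exec.Cmd {
	return exec.Command(partx, "-u", "-", disk)
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: partx-update <disk>")
		return
	}
	// Run the command and report its combined output on failure.
	out, err := partxUpdateCmd("partx", os.Args[1]).CombinedOutput()
	if err != nil {
		fmt.Fprintf(os.Stderr, "partx failed: %v\n%s", err, out)
		os.Exit(1)
	}
}
```

Keeping the command construction in its own function makes it easy to swap in a different binary path, similar in spirit to the configurable command parameter mentioned earlier in the thread.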
It seems that `sgdisk` doesn't do that itself while `sfdisk` does - is there a big reason to use `sgdisk`?
Anyway, since it doesn't hurt to call `partx` I would suggest to use it here and can update the PR.
> It seems that `sgdisk` doesn't do that itself while `sfdisk` does - is there a big reason to use `sgdisk`?
I'm not sure. It seems like sfdisk didn't support GPT in the past so that may be related, but I didn't check the timelines. Regardless, changing it at this point is not worth the risk. The implementation is quite tied to `sgdisk` intricacies.
> Anyway, since it doesn't hurt to call `partx` I would suggest to use it here and can update the PR.
Before we do that, did you try out this suggestion?
> And actually on that topic, would it work here to play the same trick as #1319 and emit a tagged event against the disk?
That'd be my preferred solution if it works.
I know you're aware, but for posterity: the difference with `partx` (and `partprobe`) vs `blockdev --rereadpt` is that they manually parse the table and tell the kernel about it instead of letting the kernel do it (`BLKPG` vs `BLKRRPART`). While there shouldn't be a perceptible difference, I'd rather we keep relying on the kernel and not make the first boot special. (And of course, it avoids another dependency.)
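To make the `BLKRRPART` side of that comparison concrete, here is a minimal Linux-only sketch of what `blockdev --rereadpt` boils down to: a single ioctl that asks the kernel to re-read the whole table. The function name `rereadPartitionTable` is hypothetical; the ioctl number is the value of `BLKRRPART` from `linux/fs.h`. As reported in this thread, the call is expected to fail with `EBUSY` when a partition of the disk is mounted.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"syscall"
)

// blkRRPart is the BLKRRPART ioctl request number, _IO(0x12, 95).
const blkRRPart = 0x125f

// rereadPartitionTable (hypothetical name) opens the disk and asks the
// kernel to re-read its partition table via BLKRRPART.
func rereadPartitionTable(disk string) error {
	f, err := os.Open(disk)
	if err != nil {
		return err
	}
	defer f.Close()

	_, _, errno := syscall.Syscall(syscall.SYS_IOCTL, f.Fd(), blkRRPart, 0)
	if errno != 0 {
		return fmt.Errorf("BLKRRPART on %s: %w", disk, errno)
	}
	return nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: rereadpt <disk>")
		return
	}
	err := rereadPartitionTable(os.Args[1])
	switch {
	case err == nil:
		fmt.Println("kernel re-read the partition table")
	case errors.Is(err, syscall.EBUSY):
		// The case discussed above: the disk is in use.
		fmt.Println("disk is busy; the kernel table was not updated")
	default:
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

The `BLKPG` path used by `partx` and `partprobe` is more involved because the tool has to parse the table itself and add, delete, or resize each partition individually, so it is not sketched here.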
> And actually on that topic, would it work here to play the same trick as #1319 and emit a tagged event against the disk?
This here is not a race, it's really because the `BLKRRPART` ioctl on the block device fails when one partition is mounted (e.g., the ESP or root partition). In this case, to my understanding, the partitioning program should add the partition objects one by one with `BLKPG` (I would argue that normally this is even the preferred way because it avoids the full re-read and async waiting logic needed when wanting to access the created partition).
I haven't tried if supplying `sfdisk` as `sgdisk` command would work but anyway, I think we need to work around the deficiencies of `sgdisk` and run `partx`/`partprobe` from Ignition.
`udevadm trigger --settle` does not result in the new partition showing up, maybe that wasn't clear from my previous comments.
(After a reboot, the new partition shows up without problems of course.)
OK, I think I understand better now.
Now, is that by design (e.g. you're knowingly mounting filesystems before the disks stage runs), or a bug (e.g. something is racing, maybe udev, and probing filesystems between the time
Correct, it's by design because users want to add a new partition on the boot disk where in our case
Gotcha. And did your initrd change recently to do this mounting earlier than it used to? I'm trying to understand how this ever worked before. I don't think there's a udev rule that will call out to Thinking on this, it's not clear to me whether we should even try to support this (partitioning a busy disk). It makes Ignition less deterministic since depending on the config, it may or may not be able to update the kernel. Though if we can ensure that we correctly error out in the latter case, that'd be OK I guess... I'll try to gather thoughts from a few others.
Our initrd uses the
I'm OK with this I think if we can confirm that Ignition will correctly error out if we try to nuke/modify partitions that are currently busy (i.e. with mounted filesystems). In other words, the invariant here is that the loaded partition table must always match the on-disk state at the end of the operation. And fail if that invariant cannot hold.
This is orthogonal: currently sgdisk will not error out if you remove or resize a partition in use. Detecting this is also not easy because it's not only mounted filesystems but also about device mapper usage.
Yes, that's the aim of this PR, as this wasn't upheld before.
I created a new issue for this: #1745
Let me reword my comment a different way: I think the fact that partitioning a busy disk "additively" worked was a happy accident and not a design choice. No downstream distro until now needed it. This PR is making it a design choice, and I'm OK with that, but then we should try to implement it properly. I had hoped that Re. #1745, I think a simpler way to check for whether the partition is busy is to check if the As a bonus, checking this upfront means that we can call Also, it looks like
It's ready for review. I've tested it with Flatcar's kola suite and also manually: adding a partition on the boot device, updating a partition (resize), and deleting one. I also checked that modifying partitions that are in use produces an error and does not touch the data.
Thank you so much for going through the testing. I will try and take a look today. When you can, please make sure to rebase, as some of our CI has been updated and will prevent a merge without it. Thank you again for your patience and diligence.
Thanks for the note, I've rebased it.
Generally LGTM, I have a few asks and nits if you would not mind taking a look.
Nothing is blocking minus release notes.
I've added a review request from myself as well. I should be able to review it soon too.
Some minor details, but this looks sane overall. Thanks!
		return false, fmt.Errorf("failed to resolve %q: %v", blockDev, err)
	}

	mounts, err := os.Open("/proc/mounts")
A bit heavy to parse `/proc/mounts` on every block device. Ideally, we'd do this once. In practice, I don't think it matters too much.
Yes, there shouldn't be millions of mount points in the initrd - if that ever is the case it can be cached.
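If caching ever becomes necessary, one way to "do this once" is to parse the mount table into a set up front and answer each per-device question with a map lookup. The sketch below is not the code from this PR: the function name `mountedSources` is hypothetical, it only looks at the first field of each line, and it assumes the device path passed in has already been resolved to the same form that appears in `/proc/mounts`.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// mountedSources (hypothetical name) takes text in /proc/mounts format
// and returns the set of mount sources, i.e. the first field of every line.
func mountedSources(mounts string) map[string]bool {
	sources := make(map[string]bool)
	for _, line := range strings.Split(mounts, "\n") {
		fields := strings.Fields(line)
		if len(fields) < 2 {
			continue // skip empty or malformed lines
		}
		sources[fields[0]] = true
	}
	return sources
}

func main() {
	// Read and parse the mount table a single time.
	data, err := os.ReadFile("/proc/mounts")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	sources := mountedSources(string(data))

	// Each device check is now a map lookup instead of a re-parse.
	for _, dev := range os.Args[1:] {
		fmt.Printf("%s mounted: %v\n", dev, sources[dev])
	}
}
```

As noted later in the thread, mounted filesystems are not the only way a partition can be in use (device mapper is another), so a check like this covers only part of the problem.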
I'll try to have a look next week (edit: in the next months…).
The patches work well in the Flatcar PR, where I backported them to replace the first patch posted here. I tested three cases: the overwriting of USR-A was detected and an error was thrown, while the other two changes passed without problems because USR-B is unused and can be overwritten, and the new partition is also OK to add (confirmed with
This looks good to me, thank you again for being so diligent. @jlebon wdyt?
A few comments, but looks great overall!
LGTM, thanks for all your work and patience on this!
disks: Refuse to modify disks/partitions in use
When a partition or the whole disk is in use, sgdisk should not execute
the destructive operation.
Add a check that errors out when a disk in use or a partition in use is
to be destroyed.
sgdisk: Run partx after partition changes
The sgdisk tool does not update the kernel partition table with BLKPG in
contrast to other similar tools but only uses BLKRRPART which fails as
soon as one partition of the disk is mounted.
Update the kernel partition table with partx when we know that a
partition of the disk is in use.
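Taken together, the two commit messages describe a small decision procedure: refuse destructive changes to anything in use, and otherwise fall back to partx when the disk is busy. The following is a rough sketch of that flow, not the merged implementation; the type `partitionOp`, its fields, and the function `planTableUpdate` are all hypothetical names chosen for illustration.

```go
package main

import "fmt"

// partitionOp is a hypothetical description of one requested change.
type partitionOp struct {
	Number      int
	Destructive bool // the change deletes, resizes, or overwrites the partition
	InUse       bool // the partition is currently in use (e.g. mounted)
}

// planTableUpdate (hypothetical) returns an error if a destructive change
// targets a partition in use. Otherwise it reports whether partx should be
// run afterwards, which is the case when some partition of the disk is in use.
func planTableUpdate(ops []partitionOp, diskInUse bool) (usePartx bool, err error) {
	for _, op := range ops {
		if op.Destructive && op.InUse {
			return false, fmt.Errorf("partition %d is in use, refusing to modify it", op.Number)
		}
	}
	return diskInUse, nil
}

func main() {
	// Adding a new partition to a busy boot disk: allowed, needs partx.
	usePartx, err := planTableUpdate([]partitionOp{{Number: 10}}, true)
	fmt.Println(usePartx, err)

	// Deleting a mounted partition: rejected before sgdisk would run.
	_, err = planTableUpdate([]partitionOp{{Number: 3, Destructive: true, InUse: true}}, true)
	fmt.Println(err)
}
```

Checking before any change is written means the in-use case fails early and leaves the on-disk table untouched.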